# Long Sequence Processing

**ModernPubMedBERT** · lokeshch19 · Apache-2.0 · 380 downloads · 2 likes
A sentence transformer trained on the PubMed dataset that supports multiple embedding dimensions, suited to biomedical text processing.
Tags: Text Embedding

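A minimal usage sketch, assuming the checkpoint follows the standard sentence-transformers interface and lives at lokeshch19/ModernPubMedBERT (the id is inferred from the listing, not verified); `truncate_dim` is how sentence-transformers selects a reduced embedding dimension when a model is trained to support several:

```python
from sentence_transformers import SentenceTransformer

# Hub id inferred from the listing; truncate_dim assumes the model was
# trained with Matryoshka-style objectives that make smaller dims usable.
model = SentenceTransformer("lokeshch19/ModernPubMedBERT", truncate_dim=256)

sentences = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "The patient's hyperglycemia was managed with metformin.",
]
embeddings = model.encode(sentences)             # shape: (2, 256)
print(model.similarity(embeddings, embeddings))  # cosine similarity matrix
```
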
**Ruri v3 30M** · cl-nagoya · Apache-2.0 · 1,135 downloads · 3 likes
Ruri v3 is a general-purpose Japanese text embedding model built on ModernBERT-Ja. It supports sequences of up to 8192 tokens and delivers top-tier performance on Japanese text embedding tasks.
Tags: Text Embedding, Japanese

**Sapnous VR 6B** · Sapnous-AI · Apache-2.0 · 261 downloads · 5 likes
Sapnous-6B is a vision-language model that pairs visual perception with language understanding through multimodal training.
Tags: Image-to-Text, Transformers, English

**FANformer 1B** · dongyh · MIT · 114 downloads · 2 likes
FANformer-1B is an autoregressive language model that adds a periodicity-modeling mechanism to the Transformer. It has 1.1 billion non-embedding parameters and was trained on 1 trillion tokens.
Tags: Large Language Model, Transformers, English

**CodeModernBERT-Owl** · Shuu12121 · Apache-2.0 · 285 downloads · 5 likes
CodeModernBERT-Owl is pre-trained from scratch for code retrieval and code understanding, supports multiple programming languages, and improves retrieval accuracy.
Tags: Text Embedding, Multilingual

**Zamba 7B v1 Phase1** · Zyphra · Apache-2.0 · 22 downloads · 5 likes
Zamba-7B-v1-phase1 is a hybrid architecture that combines the Mamba state space model with a Transformer: Mamba forms the backbone, a single shared Transformer layer is applied every six Mamba blocks, and training uses next-token prediction.
Tags: Large Language Model, Transformers

**BERT Large Cantonese** · hon9kon9ize · 448 downloads · 8 likes
A large BERT model trained from scratch on Cantonese text, suited to masked language modeling in Cantonese.
Tags: Large Language Model, Transformers, Other

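A sketch of masked-token prediction with the transformers fill-mask pipeline; the hub id hon9kon9ize/bert-large-cantonese is an assumption based on the listing:

```python
from transformers import pipeline

# Hub id assumed from the listing entry above.
fill = pipeline("fill-mask", model="hon9kon9ize/bert-large-cantonese")

# "The weather in Hong Kong is very [MASK]." in Cantonese.
for pred in fill(f"香港嘅天氣好{fill.tokenizer.mask_token}。"):
    print(pred["token_str"], round(pred["score"], 3))
```
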
**Mistral SUPRA** · TRI-ML · Apache-2.0 · 163 downloads · 12 likes
Mistral-SUPRA is a linear RNN initialized from Mistral-7B, combining the parallel training of a Transformer with the recurrent inference of an RNN.
Tags: Large Language Model, PyTorch, English

**Saul Instruct v1 GGUF** · MaziyarPanahi · MIT · 456 downloads · 8 likes
The GGUF-format release of Equall/Saul-Instruct-v1, suited to text generation and available at multiple quantization levels.
Tags: Large Language Model, English

**Mamba 790M HF** · state-spaces · 6,897 downloads · 4 likes
An efficient Mamba sequence model with 790 million parameters, packaged for the Hugging Face transformers library and suited to causal language modeling.
Tags: Large Language Model, Transformers

**Mamba 130M HF** · state-spaces · 46.83k downloads · 56 likes
A 130M-parameter Mamba sequence model with efficient inference, packaged for the transformers library.
Tags: Large Language Model, Transformers

**Mamba 1.4B HF** · state-spaces · 5,431 downloads · 11 likes
An efficient language model built on the state space model (SSM) architecture, with 1.4B parameters, supporting text generation; all three state-spaces checkpoints share the interface shown in the sketch below.
Tags: Large Language Model, Transformers

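The three state-spaces checkpoints above expose the standard causal-LM interface of the transformers library, so generation looks the same at any size; a minimal sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the -hf checkpoints works here; 130m is the smallest.
name = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("State space models are", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
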
**Rank Zephyr 7B v1 Full GGUF** · MaziyarPanahi · MIT · 708 downloads · 5 likes
A text ranking model based on Mistral-7B, published in multiple quantized GGUF versions for efficient inference.
Tags: Large Language Model, English

**Mixtral 8x7B v0.1 GGUF** · MaziyarPanahi · Apache-2.0 · 128 downloads · 1 like
A GGUF-quantized build of Mixtral-8x7B-v0.1 offering multiple bit widths, suited to text generation; a loading sketch follows below.
Tags: Large Language Model, Multilingual

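GGUF builds are intended for llama.cpp-style runtimes rather than transformers; a sketch with llama-cpp-python, where the repo id comes from the listing and the quantization filename pattern is an assumption (use any variant actually published in the repo):

```python
from llama_cpp import Llama

# Repo id from the listing; the filename glob is an assumed quant level.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Mixtral-8x7B-v0.1-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

out = llm("Q: What is a sparse mixture of experts?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```
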
**SauerkrautLM 7B HerO Mistral 7B Instruct v0.1 GGUF** · MaziyarPanahi · Apache-2.0 · 90 downloads · 2 likes
A German/English bilingual model fine-tuned from Mistral-7B-Instruct-v0.1, quantized to GGUF with quantization levels from 2 to 8 bits.
Tags: Large Language Model, Multilingual

**Mamba 1B** · Q-bert · Apache-2.0 · 185 downloads · 28 likes
Mamba-1B is a 1B-parameter language model built on the Mamba architecture, supporting English text generation.
Tags: Large Language Model, Transformers, English

**Dolphin 2.5 Mixtral 8x7B GPTQ** · TheBloke · Apache-2.0 · 164 downloads · 112 likes
Dolphin 2.5 Mixtral 8x7B is Eric Hartford's Mixtral-based model, fine-tuned on several high-quality datasets and suited to a range of natural language processing tasks; this repository carries the GPTQ quantization.
Tags: Large Language Model, Transformers, English

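GPTQ repositories like this one load through transformers when a GPTQ backend is installed (for example the optimum and auto-gptq packages); a sketch, with the repo id taken from the listing:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Requires a GPTQ backend (e.g. optimum + auto-gptq); repo id from the listing.
name = "TheBloke/dolphin-2.5-mixtral-8x7b-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

inputs = tokenizer("Explain GPTQ quantization in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
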
**Mixtral 8x7B Instruct v0.1 HF** · LoneStriker · Apache-2.0 · 45 downloads · 4 likes
Mixtral-8x7B is a pretrained generative sparse mixture-of-experts large language model that outperforms Llama 2 70B on most benchmarks.
Tags: Large Language Model, Transformers, Multilingual

**Jais 30B v1** · inceptionai · Apache-2.0 · 37 downloads · 23 likes
JAIS-30B is a 30-billion-parameter bilingual (Arabic and English) large language model based on the GPT-3 architecture. It uses ALiBi positional embeddings and achieves state-of-the-art performance on Arabic tasks.
Tags: Large Language Model, Transformers, Multilingual

**LLaVA v1.5 13B GPTQ** · TheBloke · 131 downloads · 37 likes
LLaVA v1.5 13B is a multimodal model developed by Haotian Liu that combines visual and language capabilities to understand images and generate text about them; this repository carries the GPTQ quantization.
Tags: Image-to-Text, Transformers

**Jais 13B 8bit** · asas-ai · Apache-2.0 · 72 downloads · 9 likes
A 13-billion-parameter Arabic-English bilingual large language model based on the Transformer architecture, supporting long sequence processing and served here in 8-bit.
Tags: Large Language Model, Transformers, Multilingual

**CodeLlama 34B Instruct GPTQ** · TheBloke · 174 downloads · 75 likes
CodeLlama 34B Instruct is Meta's 34-billion-parameter code generation model, built on the Llama 2 architecture and fine-tuned for programming tasks; this repository carries the GPTQ quantization.
Tags: Large Language Model, Transformers, Other

**Nystromformer 4096** · uw-madison · 74 downloads · 3 likes
A long-sequence Nyströmformer trained on the WikiText-103 v1 dataset, supporting sequences of up to 4096 tokens.
Tags: Large Language Model, Transformers

**Nystromformer 2048** · uw-madison · 38 downloads · 1 like
A Nyströmformer trained on the WikiText-103 dataset, supporting long sequences of up to 2048 tokens.
Tags: Large Language Model, Transformers

**20220415 210530** · lilitket · Apache-2.0 · 20 downloads · 0 likes
A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-2b on the common_voice dataset.
Tags: Speech Recognition, Transformers

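Fine-tuned wav2vec2 checkpoints plug into the transformers ASR pipeline; a sketch, assuming the hub id matches the timestamped run name lilitket/20220415-210530 from the listing:

```python
from transformers import pipeline

# Hub id assumed from the timestamped run name in the listing.
asr = pipeline("automatic-speech-recognition", model="lilitket/20220415-210530")

# Any local audio file works; decoding is handled via ffmpeg.
print(asr("sample.wav")["text"])
```
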
**CPT Large** · fnlp · 122 downloads · 16 likes
A pre-trained unbalanced (asymmetric encoder-decoder) Transformer for Chinese understanding and generation, supporting a range of natural language processing tasks.
Tags: Large Language Model, Transformers, Chinese

**Nystromformer 512** · uw-madison · 1,570 downloads · 2 likes
An efficient Transformer that approximates self-attention with the Nyström method, targeting long-sequence tasks.
Tags: Large Language Model, Transformers

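Since the uw-madison Nyströmformer checkpoints were trained with masked language modeling on WikiText-103, a fill-mask sketch applies (hub ids are assumed to follow the listing names):

```python
from transformers import pipeline

# Hub id assumed from the listing; the 2048 and 4096 variants swap in the same way.
fill = pipeline("fill-mask", model="uw-madison/nystromformer-512")

for pred in fill(f"Paris is the {fill.tokenizer.mask_token} of France."):
    print(pred["token_str"], round(pred["score"], 3))
```
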
**Language Perceiver** · deepmind · Apache-2.0 · 9,840 downloads · 20 likes
A Perceiver IO language model pre-trained with BERT-style masked language modeling. The architecture is modality-agnostic and consumes raw UTF-8 bytes instead of subword tokens.
Tags: Large Language Model, Transformers, English

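Because the tokenizer operates on bytes, masking covers a byte span rather than a word position; a sketch along the lines of the model's documented masked-LM usage (the span offsets below target the trailing " missing." in this particular sentence):

```python
import torch
from transformers import PerceiverForMaskedLM, PerceiverTokenizer

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")

text = "This is an incomplete sentence where some words are missing."
enc = tokenizer(text, padding="max_length", return_tensors="pt")

# Mask the bytes that encode " missing." (all-ASCII, so one byte per character).
enc["input_ids"][0, 52:61] = tokenizer.mask_token_id

with torch.no_grad():
    logits = model(**enc).logits
print(tokenizer.decode(logits[0, 52:61].argmax(dim=-1)))
```
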
**BigBird RoBERTa Large** · google · Apache-2.0 · 1,152 downloads · 27 likes
BigBird is a sparse-attention Transformer that can process sequences of up to 4096 tokens, suited to long-document tasks.
Tags: Large Language Model, English

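A sketch of long-document encoding; transformers selects BigBird's block-sparse attention by default for inputs this long:

```python
from transformers import AutoTokenizer, BigBirdModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-large")
model = BigBirdModel.from_pretrained("google/bigbird-roberta-large")

# A synthetic long document; real inputs of up to 4096 tokens work the same way.
long_text = " ".join(["Long documents need sparse attention."] * 400)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```
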
**CPT Base** · fnlp · 37 downloads · 14 likes
An asymmetric pre-trained Transformer for Chinese comprehension and generation tasks.
Tags: Large Language Model, Transformers, Chinese

**BioBERT Large Cased v1.1 SQuAD** · dmis-lab · 1,227 downloads · 18 likes
BioBERT is a BERT-based pretrained language model optimized for biomedical text mining; this checkpoint is fine-tuned on SQuAD for question answering.
Tags: Question Answering

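A sketch of extractive question answering with the transformers pipeline; the hub id is assumed to follow the listing name:

```python
from transformers import pipeline

# Hub id assumed from the listing entry above.
qa = pipeline("question-answering", model="dmis-lab/biobert-large-cased-v1.1-squad")

answer = qa(
    question="What is metformin used to treat?",
    context="Metformin is an oral medication commonly prescribed to treat type 2 diabetes.",
)
print(answer["answer"], round(answer["score"], 3))
```
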
**YOSO 4096** · uw-madison · 2,072 downloads · 0 likes
YOSO is an efficient Transformer variant that reduces self-attention complexity from quadratic to linear via a Bernoulli-sampling attention mechanism, supporting sequence lengths up to 4096.
Tags: Large Language Model, Transformers